Bit Transposed Files
Abstract
Conventional access methods cannot be used effectively in large Scientific/Statistical Database (SSDB) applications. A file structure, called the bit transposed file, is proposed which offers several attractive features that are better suited to the special characteristics that SSDBs exhibit. This file structure is an extreme version of the (attribute) transposed file. The data is stored by vertical bit partitions. The bit patterns of attributes are assigned using one of several data encoding methods, each of which is appropriate for different query types. The bit partitions can also be compressed using a version of the run length encoding scheme. Efficient operators on compressed bit vectors have been developed and form the basis of a query language. In addition to selective power with low overhead for SSDBs, the bit transposed file is also amenable to special parallel hardware. Results from experiments with the file structure suggest that this approach may be a reasonable alternative file structure for large SSDBs.

1. Introduction and Motivation

Scientific/Statistical Databases (SSDBs) exhibit many specialized characteristics and patterns of data usage ([Shoshani,Olken,Wong84], [Wong84]). Despite the advent of many advanced access methods, the dominant file structure for very large SSDBs is still the simple sequential file. The major reason is a "mismatch" between conventional access methods (inverted files, B-trees, hashing, etc.) and the characteristics of SSDBs. First, since the cardinality of SSDB attributes is typically small, most access methods simply partition the database into a small number of still very large files, adding only limited selective power at a prohibitively expensive overhead for pointers, structures, tables, etc. Second, since SSDBs are largely static, the expensive overhead associated with the dynamic facilities of most access methods is not justified.
Third, the values of SSDB attributes tend to cluster, and current access methods often do not take advantage of this opportunity for compression. Fourth, access to SSDBs typically takes the form of a long "sweep", i.e., a long sequence of individual records is fetched and a small number of attributes are extracted. This kind of range access is not supported well by most access methods.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large …
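The idea of storing data by vertical bit partitions, as summarized in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`transpose_bits`, `select_equal`, `matching_records`) are hypothetical, plain binary encoding is assumed as one possible encoding method, the bit vectors are held as uncompressed Python integers, and run length compression is left out.

```python
from typing import List


def transpose_bits(values: List[int], width: int) -> List[int]:
    """Build one bit vector per bit position of a single attribute.

    Assumes plain binary encoding of the attribute values (an assumption,
    not necessarily an encoding used in the paper). Slice j is a Python int
    whose bit i is bit j of record i.
    """
    slices = [0] * width
    for i, v in enumerate(values):
        for j in range(width):
            if (v >> j) & 1:
                slices[j] |= 1 << i
    return slices


def select_equal(slices: List[int], n_records: int, target: int) -> int:
    """Return a bit vector marking records whose value equals `target`.

    Only whole-vector AND / NOT operations are used, one per bit slice.
    """
    mask = (1 << n_records) - 1  # keeps NOT results within n_records bits
    result = mask
    for j, s in enumerate(slices):
        if (target >> j) & 1:
            result &= s
        else:
            result &= ~s & mask
    return result


def matching_records(bitvec: int) -> List[int]:
    """List the record numbers whose bit is set in `bitvec`."""
    return [i for i in range(bitvec.bit_length()) if (bitvec >> i) & 1]


if __name__ == "__main__":
    values = [3, 1, 3, 0, 2]             # one low-cardinality attribute
    slices = transpose_bits(values, 2)   # two vertical bit partitions
    hits = select_equal(slices, len(values), 3)
    print(matching_records(hits))
```

In this sketch a selection touches only the bit slices of the attribute being tested, and each slice is processed with whole-vector logical operations, which loosely mirrors the long sequential "sweep" access pattern described in the introduction.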
Similar Resources
Star Query Plans in SADAS: A Data Warehouse System Based on Transposed Files
Abstract The main commercial data warehouse systems available today are based on record-oriented relational technology optimized for OLTP applications. Several authors have shown that substantial improvements in query performance for OLAP applications can be achieved by systems based on transposed files (column-oriented) technology, since the dominant queries require grouping and aggregation on...
Transposed Form of Folded Fir Filter
A design method for folded finite-impulse response (FIR) filters on pipelined array based multiplier arrays is presented in this paper. The design is considered at the bit-level of the pipelined multiplier array, and internal delays are fully exploited in order to reduce power consumption and hardware complexity; transposed FIR filter forms are considered. The proposed schemes are compared as ...
PEDICLE MUSCLE FLAPS IN IRRADIATED WOUNDS: DOES PREVIOUS RADIATION OF THE MUSCLE TO BE TRANSPOSED-AFFECT THE OUTCOME? A 9 YEARS EXPERIENCE OVER 206 CONSECUTIVE CASES
Radiation-related wounds challenge surgeons in all disciplines of surgery. Wound-healing complications are commonplace, and solutions for reconstruction are limited. Muscle and musculocutaneous flaps have improved this situation. But the question is, does previous radiation of the muscle to be transposed affect the outcome? 143 consecutive previously irradiated patients treated with muscle...
An Alternative Arrangement of Symmetric Datasets for Vertical Clustering Algorithms
A symmetric dataset is defined as an n x n dataset that, when transposed, is equal to the dataset prior to transposition. In data mining algorithms that employ a vertical data structure, symmetric datasets are used, for example, in clustering gene expression patterns and in density-based clustering algorithms. In the former, a symmetric dataset is the pre-computed pairwise similarity of genes, while in the l...
Compression of Unicode Files
The increasing importance of Unicode for text files, for example with Java and in some modern operating systems, implies a possible doubling of data storage space and data transmission time, with a corresponding need for data compression. However it is not clear that data compressors designed for 8-bit byte data are well matched to 16-bit Unicode data. This paper investigates the compression of...
Publication date: 1985